DAC: Detector-Agnostic Spatial Covariances for Deep Local Features
Current deep visual local feature detectors do not model the spatial
uncertainty of detected features, producing suboptimal results in downstream
applications. In this work, we propose two post-hoc covariance estimates that
can be plugged into any pretrained deep feature detector: a simple, isotropic
covariance estimate that uses the predicted score at a given pixel location,
and a full covariance estimate via the local structure tensor of the learned
score maps. Both methods are easy to implement and can be applied to any deep
feature detector. We show that these covariances are directly related to errors
in feature matching, leading to improvements in downstream tasks, including
solving the perspective-n-point problem and motion-only bundle adjustment. Code
is available at https://github.com/javrtg/DA
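The two post-hoc estimators described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the window size, weighting, and normalization used here are assumptions, and the regularizer `eps` is added purely for numerical stability.

```python
import numpy as np

def structure_tensor_covariance(score, x, y, window=3, eps=1e-8):
    """Full 2x2 covariance for a keypoint at pixel (x, y), estimated from
    the local structure tensor of the detector's score map (sketch only;
    the paper's exact windowing may differ)."""
    gy, gx = np.gradient(score.astype(np.float64))
    h = window // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    # Structure tensor: sum of gradient outer products over the window.
    J = np.array([[(gx[sl] ** 2).sum(), (gx[sl] * gy[sl]).sum()],
                  [(gx[sl] * gy[sl]).sum(), (gy[sl] ** 2).sum()]])
    # Inverting the tensor: sharp, well-localized peaks -> small covariance.
    return np.linalg.inv(J + eps * np.eye(2))

def isotropic_covariance(score, x, y, eps=1e-8):
    """Simple isotropic alternative: variance inversely proportional to
    the predicted score at the pixel."""
    return np.eye(2) / (score[y, x] + eps)
```

Either function returns a 2x2 covariance that can be fed directly into covariance-weighted downstream solvers such as PnP or bundle adjustment.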
Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval
Uncertainty quantification in image retrieval is crucial for downstream
decisions, yet it remains a challenging and largely unexplored problem. Current
methods for estimating uncertainties are poorly calibrated, computationally
expensive, or based on heuristics. We present a new method that views image
embeddings as stochastic features rather than deterministic features. Our two
main contributions are (1) a likelihood that matches the triplet constraint and
that evaluates the probability of an anchor being closer to a positive than a
negative; and (2) a prior over the feature space that justifies the
conventional l2 normalization. To ensure computational efficiency, we derive a
variational approximation of the posterior, called the Bayesian triplet loss,
that produces state-of-the-art uncertainty estimates and matches the predictive
performance of current state-of-the-art methods.
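The central quantity in the triplet likelihood above, the probability that the anchor embedding lands closer to the positive than to the negative, can be illustrated with a Monte Carlo stand-in. This is a hypothetical sketch under isotropic Gaussian embeddings; the paper derives a variational approximation rather than sampling.

```python
import numpy as np

def prob_anchor_closer(mu_a, s_a, mu_p, s_p, mu_n, s_n,
                       n_samples=10000, rng=None):
    """Monte Carlo estimate of P(||a - p||^2 < ||a - n||^2) when the
    embeddings a, p, n are isotropic Gaussians N(mu, s^2 I).
    Illustrative stand-in for the paper's closed-form treatment."""
    rng = np.random.default_rng(0) if rng is None else rng
    a = mu_a + s_a * rng.standard_normal((n_samples, mu_a.size))
    p = mu_p + s_p * rng.standard_normal((n_samples, mu_p.size))
    n = mu_n + s_n * rng.standard_normal((n_samples, mu_n.size))
    d_ap = ((a - p) ** 2).sum(axis=1)   # squared anchor-positive distances
    d_an = ((a - n) ** 2).sum(axis=1)   # squared anchor-negative distances
    return (d_ap < d_an).mean()
```

A probability near 1 means the triplet constraint is satisfied with high confidence; the embedding variances control how quickly that confidence degrades.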
Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs
Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such
as floaters or flawed geometry when rendered outside the camera trajectory.
Existing evaluation protocols often do not capture these effects, since they
usually only assess image quality at every 8th frame of the training capture.
To push forward progress in novel-view synthesis, we propose a new dataset and
evaluation procedure, where two camera trajectories are recorded of the scene:
one used for training, and the other for evaluation. In this more challenging
in-the-wild setting, we find that existing hand-crafted regularizers do not
remove floaters nor improve scene geometry. Thus, we propose a 3D
diffusion-based method that leverages local 3D priors and a novel density-based
score distillation sampling loss to discourage artifacts during NeRF
optimization. We show that this data-driven prior removes floaters and improves
scene geometry for casual captures.
Comment: ICCV 2023, project page: https://ethanweber.me/nerfbuster
K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
We introduce k-planes, a white-box model for radiance fields in arbitrary
dimensions. Our model uses d choose 2 planes to represent a d-dimensional
scene, providing a seamless way to go from static (d=3) to dynamic (d=4)
scenes. This planar factorization makes adding dimension-specific priors easy,
e.g. temporal smoothness and multi-resolution spatial structure, and induces a
natural decomposition of static and dynamic components of a scene. We use a
linear feature decoder with a learned color basis that yields similar
performance as a nonlinear black-box MLP decoder. Across a range of synthetic
and real, static and dynamic, fixed and varying appearance scenes, k-planes
yields competitive and often state-of-the-art reconstruction fidelity with low
memory usage, achieving 1000x compression over a full 4D grid, and fast
optimization with a pure PyTorch implementation. For video results and code,
please see the project page at https://sarafridov.github.io/K-Planes
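The planar factorization described above can be sketched as follows: a d-dimensional point is projected onto each of the (d choose 2) axis-aligned planes, each plane is bilinearly interpolated, and the per-plane features are combined by a Hadamard product. This is a minimal single-resolution sketch; the released code also uses multiple resolutions and a learned decoder.

```python
import numpy as np
from itertools import combinations

def kplanes_features(planes, coords):
    """Look up a feature for each d-dimensional point by bilinearly
    interpolating all (d choose 2) planes and multiplying the results.
    `planes` maps an axis pair (i, j) to an (R, R, F) grid; `coords`
    is (N, d) with entries normalized to [0, 1)."""
    feat = None
    d = coords.shape[1]
    for (i, j) in combinations(range(d), 2):
        grid = planes[(i, j)]
        R = grid.shape[0]
        u, v = coords[:, i] * (R - 1), coords[:, j] * (R - 1)
        u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
        u1, v1 = np.minimum(u0 + 1, R - 1), np.minimum(v0 + 1, R - 1)
        fu, fv = (u - u0)[:, None], (v - v0)[:, None]
        # Standard bilinear interpolation on this plane.
        f = (grid[u0, v0] * (1 - fu) * (1 - fv) + grid[u1, v0] * fu * (1 - fv)
             + grid[u0, v1] * (1 - fu) * fv + grid[u1, v1] * fu * fv)
        feat = f if feat is None else feat * f  # Hadamard product across planes
    return feat
```

With d=3 this gives the static xy/xz/yz decomposition; adding a time axis (d=4) adds three more planes, which is what makes the static/dynamic split fall out naturally.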
Probabilistic Spatial Transformers for Bayesian Data Augmentation
High-capacity models require vast amounts of data, and data augmentation is a
common remedy when this resource is limited. Standard augmentation techniques
apply small hand-tuned transformations to existing data, which is a brittle
process that realistically only allows for simple transformations. We propose a
Bayesian interpretation of data augmentation where the transformations are
modelled as latent variables to be marginalized, and show how these can be
inferred variationally in an end-to-end fashion. This allows for significantly
more complex transformations than manual tuning, and the marginalization
implies a form of test-time data augmentation. The resulting model can be
interpreted as a probabilistic extension of spatial transformer networks.
Experimentally, we demonstrate improvements in accuracy and uncertainty
quantification in image and time series classification tasks.
Comment: Submitted to the International Conference on Machine Learning (ICML), 202
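At test time, the marginalization over latent transformations amounts to averaging the model's predictive distributions over sampled warps of the input. The sketch below uses a hypothetical model/transform interface and samples from a fixed prior, whereas the paper infers a variational posterior over the transformation parameters.

```python
import numpy as np

def marginal_predict(model, x, sample_transform, apply_transform,
                     n_samples=8, rng=None):
    """Test-time marginalization over latent transformations:
    sample transformation parameters, warp the input, and average
    the model's predictive distributions (illustrative interface)."""
    rng = np.random.default_rng(0) if rng is None else rng
    probs = [model(apply_transform(x, sample_transform(rng)))
             for _ in range(n_samples)]
    return np.mean(probs, axis=0)  # averaged class probabilities
```

Averaging in probability space is what yields the improved uncertainty quantification: inputs whose predictions are unstable under plausible warps end up with flatter averaged distributions.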
Learning to Taste: A Multimodal Wine Dataset
We present WineSensed, a large multimodal wine dataset for studying the
relations between visual perception, language, and flavor. The dataset
encompasses 897k images of wine labels and 824k reviews of wines curated from
the Vivino platform. It has over 350k unique vintages, annotated with year,
region, rating, alcohol percentage, price, and grape composition. We obtained
fine-grained flavor annotations on a subset by conducting a wine-tasting
experiment with 256 participants who were asked to rank wines based on their
similarity in flavor, resulting in more than 5k pairwise flavor distances. We
propose a low-dimensional concept embedding algorithm that combines human
experience with automatic machine similarity kernels. We demonstrate that this
shared concept embedding space improves upon separate embedding spaces for
coarse flavor classification (alcohol percentage, country, grape, price,
rating) and aligns with the intricate human perception of flavor.
Comment: Accepted to NeurIPS 2023. See project page:
https://thoranna.github.io/learning_to_taste
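One simple way to combine human pairwise flavor distances with a machine similarity kernel, in the spirit of the shared concept embedding above, is classical MDS on a convex blend of the two distance matrices. This is a hypothetical stand-in for the paper's algorithm; the blend weight `alpha` and the use of classical MDS are assumptions.

```python
import numpy as np

def blended_embedding(d_human, d_machine, alpha=0.5, dim=2):
    """Classical MDS on a convex blend of human flavor distances and
    machine-kernel distances (illustrative stand-in, not the paper's
    concept-embedding algorithm)."""
    D = alpha * d_human + (1 - alpha) * d_machine
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]       # keep the top `dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

The resulting low-dimensional coordinates place wines that humans and the machine kernel both deem similar close together, which is the property the shared embedding space is evaluated on.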
Expression of Transketolase like gene 1 (TKTL1) predicts disease-free survival in patients with locally advanced rectal cancer receiving neoadjuvant chemoradiotherapy
Background: For patients with locally advanced rectal cancer (LARC), neoadjuvant chemoradiotherapy is recommended as standard therapy. So far, no predictive or prognostic molecular factors are established for patients undergoing multimodal treatment. Increased angiogenesis and altered tumour metabolism, as adaptations to hypoxic conditions, play an important role in tumour progression and metastasis. Enhanced expression of vascular endothelial growth factor receptor (VEGF-R) and transketolase-like 1 (TKTL1) is related to hypoxic conditions in tumours. In search of potential prognostic molecular markers, we investigated the expression of VEGFR-1, VEGFR-2 and TKTL1 in patients with LARC treated with neoadjuvant chemoradiotherapy and cetuximab.
Methods: Tumour and corresponding normal tissue from pre-therapeutic biopsies of 33 patients (m: 23, f: 10; median age: 61 years) with LARC treated in phase-I and II trials with neoadjuvant chemoradiotherapy (cetuximab, irinotecan and capecitabine in combination with radiotherapy) were analysed by quantitative PCR.
Results: Significantly higher expression of VEGFR-1/2 was found in tumour tissue, both in pre-treatment biopsies and in resected specimens after neoadjuvant chemoradiotherapy, compared to corresponding normal tissue. High TKTL1 expression correlated significantly with disease-free survival. None of the markers influenced early response parameters such as tumour regression grading. There was no correlation of gene expression between the investigated markers.
Conclusion: High TKTL1 expression correlates with poor prognosis in terms of 3-year disease-free survival in patients with LARC treated with intensified neoadjuvant chemoradiotherapy, and may therefore serve as a molecular prognostic marker that should be further evaluated in randomised clinical trials.
SparseFormer: Attention-based Depth Completion Network
Most pipelines for Augmented and Virtual Reality estimate the ego-motion of
the camera by creating a map of sparse 3D landmarks. In this paper, we tackle
the problem of depth completion, that is, densifying this sparse 3D map using
RGB images as guidance. This remains a challenging problem due to the low
density, non-uniform and outlier-prone 3D landmarks produced by SfM and SLAM
pipelines. We introduce a transformer block, SparseFormer, that fuses 3D
landmarks with deep visual features to produce dense depth. The SparseFormer
has a global receptive field, making the module especially effective for depth
completion with low-density and non-uniform landmarks. To address the issue of
depth outliers among the 3D landmarks, we introduce a trainable refinement
module that filters outliers through attention between the sparse landmarks.
Comment: Accepted at CV4ARVR 202
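The core mechanism of the fusion block, every pixel attending globally over the sparse landmarks, can be sketched as one step of scaled dot-product cross-attention. This is illustrative only: the actual SparseFormer stacks learned projections, multiple heads, and the refinement module, and the feature dimensions here are assumptions.

```python
import numpy as np

def sparse_to_dense_attention(pixel_feats, landmark_feats, landmark_depths):
    """One cross-attention step: each pixel query (P, d) attends over
    sparse landmark keys (L, d) and aggregates their depths (L,) into a
    dense per-pixel depth estimate (P,). Sketch of the mechanism only."""
    d = pixel_feats.shape[1]
    logits = pixel_feats @ landmark_feats.T / np.sqrt(d)  # (P, L) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)               # softmax over landmarks
    return attn @ landmark_depths                         # (P,) depths
```

Because every pixel can attend to every landmark, the receptive field is global, which is what makes the block robust to the low-density, non-uniform landmark layouts produced by SfM and SLAM.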
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization
Place recognition and visual localization are particularly challenging in
wide baseline configurations. In this paper, we contribute with the
\emph{Danish Airs and Grounds} (DAG) dataset, a large collection of
street-level and aerial images targeting such cases. Its main challenge lies in
the extreme viewing-angle difference between query and reference images with
consequent changes in illumination and perspective. The dataset is larger and
more diverse than current publicly available data, including more than 50 km of
road in urban, suburban and rural areas. All images are associated with
accurate 6-DoF metadata that allows the benchmarking of visual localization
methods.
We also propose a map-to-image re-localization pipeline that first estimates
a dense 3D reconstruction from the aerial images and then matches query
street-level images to street-level renderings of the 3D model. The dataset can
be downloaded at: https://frederikwarburg.github.io/DAG
Comment: Submitted to RA-L (IROS)